Probability is the expression of belief in some future outcome
A random variable can take on different values with different probabilities
The sample space of a random variable is the universe of all possible values
The sample space can be represented by a probability distribution
A Bernoulli distribution describes the outcome of a single event that occurs with probability p
Example: flipping a fair coin once
\[Pr(X=\text{Head}) = \frac{1}{2} = 0.5 = p \]
\[Pr(X=\text{Tails}) = \frac{1}{2} = 0.5 = 1 - p \]
\[ p + (1-p) = 1 \]
\[ Pr(\text{X=H and Y=H}) = p \cdot p = p^2 \] \[ Pr(\text{X=H and Y=T}) = p(1-p) \] \[ Pr(\text{X=T and Y=H}) = (1-p)p \] \[ Pr(\text{X=T and Y=T}) = (1-p)^2 \]
H and T can occur in either order\[ \text{Pr(one H and one T) =} \] \[ \text{Pr(X=H and Y=T) or Pr(X=T and Y=H)} = \] \[ p(1-p) + (1-p)p = 2p(1-p) \] For a fair coin \(p = 1-p = 0.5\), so each of the four outcomes has probability \(0.25\) and \(2p(1-p) = 0.5\).
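The two-flip example can be sketched in Python by enumerating all four outcomes (the fair-coin value p = 0.5 is taken from the example above; the code itself is an illustration, not part of the original notes):

```python
from itertools import product

p = 0.5  # probability of heads for a fair coin (from the example above)

# Enumerate the four outcomes of two independent flips;
# independence means each joint probability is a product.
probs = {}
for x, y in product("HT", repeat=2):
    pr_x = p if x == "H" else 1 - p
    pr_y = p if y == "H" else 1 - p
    probs[(x, y)] = pr_x * pr_y

# One head and one tail, in either order
pr_one_each = probs[("H", "T")] + probs[("T", "H")]
print(pr_one_each)          # 2p(1-p) = 0.5 for a fair coin
print(sum(probs.values()))  # probabilities over the sample space sum to 1
```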
Joint probability (of independent events)
\[Pr(X,Y) = Pr(X) * Pr(Y)\]
Conditional probability
If \(X\) and \(Y\) are independent: \[Pr(Y|X) = Pr(Y)\text{ and }Pr(X|Y) = Pr(X)\]
If \(X\) and \(Y\) are dependent: \[Pr(Y|X) \neq Pr(Y)\text{ and }Pr(X|Y) \neq Pr(X)\]
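A small numerical sketch of conditional probability, using a hypothetical joint distribution of two dependent coin-like variables (the table values are invented for illustration; the defining relation is Pr(Y|X) = Pr(X and Y)/Pr(X)):

```python
from fractions import Fraction

# Hypothetical joint distribution of two DEPENDENT binary variables X and Y
joint = {
    ("H", "H"): Fraction(3, 8),
    ("H", "T"): Fraction(1, 8),
    ("T", "H"): Fraction(1, 8),
    ("T", "T"): Fraction(3, 8),
}

# Marginal probabilities, summing over the other variable
pr_x_h = joint[("H", "H")] + joint[("H", "T")]  # Pr(X=H) = 1/2
pr_y_h = joint[("H", "H")] + joint[("T", "H")]  # Pr(Y=H) = 1/2

# Conditional probability: Pr(Y=H | X=H) = Pr(X=H and Y=H) / Pr(X=H)
pr_y_h_given_x_h = joint[("H", "H")] / pr_x_h

print(pr_y_h_given_x_h)  # 3/4, which differs from Pr(Y=H) = 1/2 -> dependent
```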
A binomial distribution results from the combination of several independent Bernoulli events
\[\large f(k) = {n \choose k} p^{k} (1-p)^{n-k}\]
\(n\) is the total number of trials
\(k\) is the number of successes
\(p\) is the probability of success
\(q = 1 - p\) is the probability of failure
\[\large p^{k} (1-p)^{n-k}\]
is the probability of any one particular sequence of \(k\) successes and \(n-k\) failures, and
\[\large {n \choose k}\]
is the binomial coefficient, which counts the number of such sequences
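The binomial probability mass function above can be computed directly (a sketch, not from the original notes; `binom_pmf` is a name chosen here for illustration):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Two fair-coin flips: one head and one tail, in either order
print(binom_pmf(1, 2, 0.5))  # 0.5

# The pmf sums to 1 over k = 0..n
print(sum(binom_pmf(k, 10, 0.3) for k in range(11)))
```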
Another common situation in biology is when events are discrete but, instead of a fixed number of trials, the number of occurrences of each outcome is observed
Pr(Y=r) is the probability that the number of occurrences of an event \(Y\) equals a count \(r\) in the total number of trials
\[Pr(Y=r) = \frac{e^{-\mu}\mu^r}{r!}\]
The same distribution is often written with the rate parameter \(\lambda\) in place of the mean \(\mu\):
\[Pr(Y=r) = \frac{e^{-\lambda}\lambda^r}{r!}\]
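The Poisson probability above translates directly into code (a sketch; `poisson_pmf` is an illustrative name, and λ = 2.0 is an arbitrary example rate):

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    """Pr(Y = r): probability of r occurrences when the mean count is lam."""
    return exp(-lam) * lam**r / factorial(r)

print(poisson_pmf(0, 2.0))  # e^-2, the chance of observing zero events
print(sum(poisson_pmf(r, 2.0) for r in range(50)))  # ~1 over all counts
```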
The normal (Gaussian) probability density is
\[\large f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]
where \[\large \pi \approx 3.14159\]
\[\large e \approx 2.71828\]
To write that a variable (v) is distributed as a normal distribution with mean \(\mu\) and variance \(\sigma^2\), we write the following:
\[\large v \sim \mathcal{N} (\mu,\sigma^2)\]
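As a sketch of the \(v \sim \mathcal{N}(\mu,\sigma^2)\) notation, Python's standard library can draw such a variable (the values μ = 10 and σ² = 4 are arbitrary examples; note `random.gauss` takes the standard deviation, not the variance):

```python
import random
import statistics

random.seed(1)
mu, sigma2 = 10.0, 4.0  # hypothetical: v ~ N(10, 4)

# random.gauss expects the standard deviation, so take the square root
v = [random.gauss(mu, sigma2 ** 0.5) for _ in range(100_000)]

print(statistics.mean(v))      # close to mu
print(statistics.variance(v))  # close to sigma^2
```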
Estimate of the mean from a single sample
\[\Large \bar{x} = \frac{1}{n}\sum_{i=1}^{n}{x_i} \]
Estimate of the variance from a single sample
\[\Large s^2 = \frac{1}{n-1}\sum_{i=1}^{n}{(x_i - \bar{x})^2} \]
Each observation can be standardized to a z-score \[\huge z_i = \frac{(x_i - \bar{x})}{s}\]
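The three sample formulas above (mean, variance, z-scores) can be checked on a small made-up sample (the data are invented for illustration; `statistics.variance` uses the same \(n-1\) denominator as \(s^2\) above):

```python
import statistics

x = [4.0, 8.0, 6.0, 2.0]  # a small hypothetical sample

x_bar = statistics.mean(x)   # (1/n) * sum of x_i
s2 = statistics.variance(x)  # (1/(n-1)) * sum of squared deviations
s = s2 ** 0.5

z = [(xi - x_bar) / s for xi in x]  # standardized scores

print(x_bar, s2)           # 5.0 and 20/3
print(statistics.mean(z))  # ~0: z-scores are centered at zero
```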
What is the probability that we would reject a true null hypothesis?
What is the probability that we would accept a false null hypothesis?
How do we decide when to reject a null hypothesis and support an alternative?
What can we conclude if we fail to reject a null hypothesis?
What parameter estimates of distributions are important to test hypotheses?
\[\huge t = \frac{(\bar{y}_1-\bar{y}_2)}{s_{\bar{y}_1-\bar{y}_2}} \]
where
\[\large s_{\bar{y}_1-\bar{y}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]
which is the calculation for the standard error of the difference between the means
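A sketch of the t statistic computed by hand, using the separate-variance standard error of the mean difference (the two groups are invented data chosen so the arithmetic comes out cleanly):

```python
import statistics
from math import sqrt

y1 = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical group 1
y2 = [2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical group 2

mean1, mean2 = statistics.mean(y1), statistics.mean(y2)
var1, var2 = statistics.variance(y1), statistics.variance(y2)
n1, n2 = len(y1), len(y2)

# standard error of the difference between the two means
se_diff = sqrt(var1 / n1 + var2 / n2)

t = (mean1 - mean2) / se_diff
print(t)  # -1.0: the means differ by 1 and the standard error is 1
```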
\[ Power \propto \frac{(ES)(\alpha)(\sqrt n)}{\sigma}\]
Power is proportional to the combination of these parameters
General Linear Model (GLM) - a continuous response variable with one or more continuous predictor variables
General Linear Mixed Model (GLMM) - a continuous response variable with a mix of fixed and random effects among the predictor variables
Generalized Linear Model - a general linear model that doesn’t assume normality of the response
Generalized Additive Model (GAM) - a model that doesn’t assume linearity
All can be written in the form
response variable = intercept + (explanatory_variables) + random_error
in the general form:
\[ Y=\beta_0 +\beta_1*X_1 + \beta_2*X_2 +... + \epsilon\]
where \(\beta_0, \beta_1, \beta_2, ....\) are the parameters of the linear model
\[H_0 : \beta_0 = 0\] \[H_0 : \beta_1 = 0\]
full model - \(y_i = \beta_0 + \beta_1*x_i + error_i\)
reduced model - \(y_i = \beta_0 + 0*x_i + error_i\)
\[\beta_{YX}=\rho_{YX}*\sigma_Y/\sigma_X\] \[b_{YX} = r_{YX}*S_Y/S_X\]
\[y_i = \beta_0 + \beta_1 * x_i + \epsilon_i\]
To develop a better predictive model than is possible from models based on single independent variables.
To investigate the relative individual effects of each of the multiple independent variables above and beyond the effects of the other variables.
The individual effects of each of the predictor variables on the response variable can be depicted by single partial regression lines.
The slope of any single partial regression line (partial regression slope) thereby represents the rate of change or effect of that specific predictor variable (holding all the other predictor variables constant to their respective mean values) on the response variable.
Additive model \[y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + ... + \beta_jx_{ij} + \epsilon_i\]
Multiplicative model (with two predictors) \[y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \beta_3x_{i1}x_{i2} + \epsilon_i\]
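The difference between the two model forms is just the interaction term. A sketch with invented coefficients and predictor values, evaluating each model's prediction at one point:

```python
# Hypothetical coefficients and predictor values (for illustration only)
b0, b1, b2, b3 = 1.0, 2.0, 0.5, -1.0
x1, x2 = 3.0, 2.0

# Additive model: the effect of x1 does not depend on x2
y_additive = b0 + b1 * x1 + b2 * x2

# Multiplicative model: the extra b3*x1*x2 term makes the
# effect of x1 depend on the level of x2 (an interaction)
y_interaction = y_additive + b3 * x1 * x2

print(y_additive)     # 8.0
print(y_interaction)  # 2.0
```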
From Langford, D. J.,et al. 2006. Science 312: 1967-1970
In words:
stretching = intercept + treatment + random_error
\[Z_{ik} = c_1y_{i1} + c_2y_{i2} + c_3y_{i3} + ... + c_py_{ip}\]